This aggregated cheatsheet of our tutorial package Introduction to programming with R summarizes the individual notes from the provided tutorials and additional information from recommended sources.
- == operator tests for equality of things; != for inequality
- = is used for assignment of values to arguments (don’t use for object assignment!)
- logical TRUE/FALSE values
- logical operators & (AND), | (OR), and ! (NOT)
- functions are called with () after their name
- c() combines/concatenates multiple values into one vector object
- install.packages() installs a new package via its name from the console (also possible via ‘Packages’ menu in RStudio)
  - ggplot2 for visualization
  - dplyr for data transformation
  - tidyverse (meta-package that includes both)
- library() loads a package via its name
- help() provides detailed information on functions, data sets, packages
  - ? short version
- head() first elements of object
- glimpse() (from dplyr package) compact variable view of tabular data
- incomplete commands typically end in an operator (+, <-, =, …) or lack a closing ), ], }, or closing quote, i.e. ", '
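The basics listed above can be tried out with a few lines of base R. This is only a minimal sketch; the object names (values, is_five, ...) are made up for illustration.

```r
# Objects are assigned with <-, while = passes values to function arguments.
values <- c(2, 5, 8)                   # c() concatenates values into one vector

# Comparisons work element-wise and yield TRUE/FALSE values.
is_five  <- values == 5                # FALSE  TRUE FALSE
not_five <- values != 5                # TRUE  FALSE  TRUE

# Logical operators combine such results.
in_range <- values > 1 & values < 6    # AND: TRUE  TRUE FALSE
extreme  <- values < 3 | values > 7    # OR:  TRUE FALSE  TRUE
negated  <- !in_range                  # NOT: FALSE FALSE TRUE

# head() returns the first elements; n is an argument and therefore set with =.
first_two <- head(values, n = 2)
print(first_two)
```

Typing ?head or help(head) in the console opens the documentation of the function used in the last step.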
- ggplot2 is the visualization package
- ggplot() defines what data to plot (and general/shared aesthetics)
  - data = first argument = what tidy data table to visualize
- geom_...() defines how to plot the data, i.e. the plot geometry, e.g.
  - geom_point() (Summary page)
  - geom_line() (Summary page)
  - geom_smooth() (Summary page)
- + combines different geometries, labels, etc. into one plot
  - ggplot(mtcars, mapping=aes(x=wt, y=mpg)) + geom_point() + xlab("weight")
- mapping argument takes aes() aesthetics, which define what data variables from the input table are used where in the plot, i.e. their mapping (Summary page)
  - in ggplot(): applied to all subsequent geom_*() functions
    - ggplot(mtcars, mapping=aes(x=wt, y=mpg)) + geom_point()
  - in geom_point(): only used when doing this point drawing
    - ggplot(mtcars) + geom_point(mapping=aes(x=wt, y=mpg))
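The difference between shared and layer-specific aesthetics can be sketched as follows, assuming the ggplot2 package is installed. The object names p_shared and p_local are made up for illustration; mtcars ships with R.

```r
library(ggplot2)

# Aesthetics given in ggplot() are shared by every geometry added with +.
p_shared <- ggplot(mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth() +
  xlab("weight")

# Aesthetics given inside a geometry are used by that layer only.
p_local <- ggplot(mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))
```

Typing p_shared or p_local in the console draws the respective plot.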
- object names may contain . and _
- names are case sensitive: myname is not myName
- <- for object assignment (RStudio hot key = ‘ALT’ + ‘-’)
- seq() generates a vector of subsequent numbers, e.g. seq(2,5) is equal to c(2,3,4,5)
- strings are quoted with '' or "", both work
- x is equal to print(x)
- wrapping an assignment in () triggers printing implicitly
- tibble = data structure to store tabular data (tidyverse extension of data.frame structure)
  - p2 <- tibble( x = 0:3, y = c(1,2,4,8)) (as done for data.frame or list)
  - p2 <- tibble( x=0:3, y=2^x)
  - p2 <- tribble( ~x, ~y, 0, 1, 1, 2, 2, 4, 3, 8) (put line breaks where appropriate)
  - $colName = dollar + name-based (most recommended), e.g. p2$y
  - [["colName"]] = double-squared + quoted name as done for lists, e.g. p2[["y"]]
  - [[2]] = double-squared + index-based (try to avoid), e.g. p2[[2]]
  - [] = single-squared + index, yields a table, not a column vector!
  - table[["strange name"]] - quotes in double-bracket usage
  - table$`strange name` - back-ticks in direct name-based access
- glimpse(p2) to check column names and resp. data types
- as_tibble() converts an object into a tibble (if possible)
- is_tibble() checks whether it is a tibble
- str() provides the data structure of an R object, i.e. how it is organized and what’s inside
- c() (concatenate) builds a vector from values
- logical tests (evaluate to TRUE or FALSE)
  - ==, !=, >, <, <=, >=
  - &, |, xor()
  - !
  - %in% vector
  - between() (dplyr)
  - near() better for floating point number comparison than ==
- filter() prunes table to rows of interest (Summary page)
  - rows for which the condition is TRUE pass the filter and are kept; all others are removed
  - multiple conditions are combined with &
- %>% piping for connection of data transformation steps
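The tibble access, logical tests, filter() and piping described above can be combined in one short sketch, assuming the tibble and dplyr packages are installed. p2 follows the example used in this cheatsheet; the other object names are made up for illustration.

```r
library(tibble)
library(dplyr)

p2 <- tibble(x = 0:3, y = 2^x)

# Name-based column access via $ or [[ ]] returns the column vector.
y_values <- p2$y
same_y   <- p2[["y"]]

# filter() keeps the rows whose condition is TRUE;
# comma-separated conditions are combined like &.
kept <- p2 %>%
  filter(x %in% c(1, 2, 3), between(y, 2, 4))

# near() tolerates floating point rounding where == does not.
exact_test <- sqrt(2)^2 == 2        # FALSE
near_test  <- near(sqrt(2)^2, 2)    # TRUE
```

Here kept contains the two rows with x equal to 1 and 2.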
- summary() statistical overview of values in data structure; e.g. for each variable of a data frame
- arrange() changes the order of rows (Summary page)
  - desc() around a variable triggers descending sorting w.r.t. the variable’s values
  - NA is always last
- select() reduces the columns to variables of interest (Summary page)
  - variables are listed by name (or given as a vector via c())
  - : generates a vector of ascendingly increasing elements (from:to) including the boundaries
  - everything() : all variables not named so far (useful for reordering of columns)
  - ! or - negation removes the specified variables (also works for vectors via c() or :)
  - starts_with(), ends_with(), contains() : parts of variable names
  - matches() : regular expression on names (discussed later)
  - num_range() : combines strings and number vectors to full names like x1, x2, …
  - any_of() : listed variable names not necessarily present in data table (useful for negative selection without errors/warnings)
- mutate() creates new variables (columns) (Summary page)
  - NAME = EXPRESSION
  - new variables are available in subsequent mutate() calls
  - transmute() creates a new table with new variables (dropping the input table's variables)
  - operations are applied to whole columns: for A-B, respective A and B values from each row from both columns are subtracted (creating a new vector with the same length as both columns; one result for each row in the same row-order)
    - arithmetic: +, -, *, /, ^ (power), log..()
    - integer division and remainder: %/% and %%
    - comparisons: <, ==, !=, >=, …
    - logical operators (TRUE/FALSE): &, |, !
    - cum...() = rolling …, e.g.
      - cumsum()/cumprod() = rolling sum/product of consecutive values (= vector of values)
      - cummin()/cummax() = min/max of all rows up to here
      - cummean()
    - ranking: min_rank() or dense_rank() (typically one of both is what you are looking for)
      - percent_rank(), row_number(), …
    - shifting by n rows
      - lag() = previous values, i.e. shifting the column down (adding NAs at the beginning)
      - lead() = following values, i.e. shifting the column up (adding NAs at the end)
    - sum() = sum of all values in the variable column (= single value)
      - mean()
    - c(1,2,3,4)+1 and c(1,2,3,4)+c(1,2) are working while c(1,2,3)+c(1,2) is not (warning: the longer length is not a multiple of the shorter one)
- summarize() produces only a single output row (per group) (Summary page)
  - n() (number of rows)
- group_by() decomposes the observations (Summary page)
  - subsequent group_by() calls overwrite the grouping of the previous!
  - ungroup() explicitly undoes the previous group_by() and merges the subtables back into one
  - mean(), median(), quantile(), sd()
  - min(), max(), first(), last(), nth()
  - n(), n_distinct()
  - sum()
    - sum() of logical values == number of TRUEs (since scored 1, where FALSE is scored 0)
  - mutate(), filter(), …
  - NA handling important (set na.rm=TRUE argument if wanted)
  - count() is a summary based on counting or summation only
- rename() allows to alter variable names (Summary page)
  - <NEWNAME> = <OLDVAR> where OLDVAR can be …
- colnames() gives you the vector of a table’s column names (base R)
- slice functions extract specific rows (Summary page)
  - slice() - rows specified by indices
  - slice_min(), slice_max() - best/worst rows w.r.t. values in a given variable
    - with_ties=FALSE to get only first min/max row
  - slice_head(), slice_tail() - top/tail rows w.r.t. current row order
  - slice_sample() - pick at random
  - n=.. argument for number of rows
  - prop=.. argument to specify a fraction of rows to keep
- distinct() reduces the table to unique observations (Summary page)
  - .keep_all=TRUE no columns are removed from the output
- join functions allow to fuse information from multiple tables into one
  - by = X - same column name X in both tables
  - by = c(X = Y) - merge based on X column in first table and Y variable from second table
  - rows without a match are filled with NA entries
  - left_join() keeps all rows from first table
  - inner_join() - only observations with values in both tables
  - full_join() - all observations from both tables
- dplyr package covers even more verbs
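Several of the dplyr verbs above are typically chained in one pipeline. The following is a minimal sketch, assuming dplyr is installed; the tables sales and shops and all their values are hypothetical example data.

```r
library(dplyr)

# Hypothetical example data (names and values are made up for illustration).
sales <- tibble(
  shop   = c("A", "A", "B", "B", "B"),
  item   = c("pen", "ink", "pen", "ink", "pad"),
  price  = c(2, 5, 3, NA, 4),
  amount = c(10, 2, 5, 1, 3)
)
shops <- tibble(shop = c("A", "B", "C"), city = c("Bonn", "Kiel", "Ulm"))

per_shop <- sales %>%
  mutate(revenue = price * amount) %>%          # one result per row
  group_by(shop) %>%                            # decompose rows into groups
  summarize(
    n_items = n(),                              # rows per group
    total   = sum(revenue, na.rm = TRUE),       # ignore NA entries
    n_cheap = sum(price < 4, na.rm = TRUE)      # TRUE is counted as 1
  ) %>%
  arrange(desc(total))                          # highest total first

# left_join() keeps all rows of the first table; cells without a match become NA.
shops_with_totals <- left_join(shops, per_shop, by = "shop")
```

Shop C has no sales, so its summary columns in shops_with_totals are NA.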
readr package (Summary page)
- read_delim() - columns are separated by a single letter given to delim argument
  - read_csv() - using , as column delimiter
  - read_csv2() - columns delimited by ; and German ,-based decimal number notation
  - read_tsv() - tabulator as column delimiter
- read_fwf() - fixed width column specification (fixed number of letters per column)
- locale argument defines language specifications like decimal separator, names of days/months, date/time encoding, letter encodings, …
  - locale=locale(decimal_mark=",", grouping_mark=".") = German number notation
  - locale=locale(encoding="latin1") = using the Latin-1 letter encoding typically used on Microsoft Windows systems to enable a correct rendering of umlauts etc.
- quote='"' to specify quotation letter
- col_types=list(colName="c") allows to specify the data type of (individual) columns
- na='--' takes a vector of (string) values to be treated as not available (NA)
- compressed files are read directly (.gz, .bz2, .xz)
- working directory of the R session defines from/to where to read/write files (enables relative path specification!)
  - getwd() and setwd()
- write functions
  - write_csv(DATATABLE, FILENAME)
  - write_csv2() if you need a , decimal separator (no manual locale specification possible!)
- readxl for MS Excel import (Summary page)
- writexl for (simple) MS Excel export (Summary page)
- googlesheets4 for import/export of online Google Sheets (Summary page)
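A small round trip illustrates the import and export functions above. This sketch assumes readr version 2.0 or later, where I() marks a string as literal data instead of a file name; the data and object names are made up for illustration.

```r
library(readr)

# German-style text: ';' separates columns and ',' is the decimal mark.
# I() marks the string as literal data (assumption: readr >= 2.0).
german_csv <- I("name;height\nAnna;1,70\nBen;--\n")

heights <- read_csv2(
  german_csv,
  na = "--",                                    # '--' is treated as NA
  col_types = list(name = "c", height = "d")    # character and double column
)

# Export with ',' as column delimiter and read the file back.
out_file <- tempfile(fileext = ".csv")
write_csv(heights, out_file)
reread <- read_csv(out_file, col_types = list(name = "c", height = "d"))
```

tempfile() is used here only to avoid writing into the working directory; a relative file name would be resolved against getwd().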
tidyr package (Summary page)
- pivot_longer() = column names go to “names” column (and values to one column)
- pivot_wider() = values from one column are distributed over multiple columns
- NA handling
  - fill() replaces NA with previous/subsequent non-NA value in column
  - complete() adds implicitly missing value combinations
  - drop_na() removes rows with NA
  - replace_na() new value for NA entries
- separate() decomposes a variable’s text values into multiple columns
- unite() joins multiple variables into one text column
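The reshaping functions above can be sketched as follows, assuming tidyr and tibble are installed. The table wide and its values are hypothetical example data.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one column per year.
wide <- tibble(country = c("DE", "FR"), `2020` = c(1, NA), `2021` = c(3, 4))

# Column names go to the "year" column, values to the "cases" column.
long <- wide %>%
  pivot_longer(cols = c(`2020`, `2021`), names_to = "year", values_to = "cases") %>%
  replace_na(list(cases = 0))      # new value for NA entries

# Values of "cases" are distributed over one column per year again.
back_to_wide <- long %>%
  pivot_wider(names_from = year, values_from = cases)
```

The back-ticks are needed because 2020 and 2021 are not regular R names.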
Strings
- strings are enclosed in quotes (single ' or double " quotes)
- escaping with \ of special characters like line-break \n, tabulator \t, …
- unicode characters (\uXXXX for UTF-16 and \U.. for UTF-32 encodings), e.g. "greek alpha = \u03B1"
- writeLines() : final text output in the console (e.g. for testing/checking)
- str_length() : number of characters/letters
- str_c() or paste(): concatenation of strings
  - sep string (placed between the joined arguments)
  - collapse string (merges the elements of a vector into one string)
- str_sub(): extracting a substring, i.e. part, of a string
  - start and end indices are inclusive
- str_to_lower(), str_to_upper(), str_to_title() : capitalization conversion
  - locale for language specifics
- str_sort() : lexicographic sorting
  - locale specific
- str_trim()/str_pad() : remove/add whitespaces at strings’ ends
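A few of the string helpers above in action, as a minimal sketch that assumes the stringr package is installed. All object names are made up for illustration.

```r
library(stringr)

greeting <- str_c("Hello", "World", sep = " ")        # "Hello World"
joined   <- str_c(c("a", "b", "c"), collapse = "+")   # "a+b+c"

n_chars <- str_length(greeting)                       # 11
first   <- str_sub(greeting, 1, 5)                    # "Hello" (start and end inclusive)
shout   <- str_to_upper(greeting)                     # "HELLO WORLD"
trimmed <- str_trim("  padded  ")                     # "padded"
padded  <- str_pad("7", width = 3, pad = "0")         # "007" (pads on the left by default)

# writeLines() renders escaped characters like \t and \n.
writeLines("tab\tseparated\nsecond line")
```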
Regular expressions
- . : any letter (but no newline per default)
- \\d (\\D) : (not a) digit
- \\s (\\S) : (not a) whitespace, i.e. space, tabulator, ..
- \\w : (english) word letter or digit
- [] : explicit letter list (use [^] for negation of list)
- ^ : beginning of the text
- $ : end of the text
- \\b : end of a word (left or right)
- | alternative separator, e.g. a|b matches a or b
- () to group or capture blocks of a pattern, e.g.
  - go (left|right)
  - (['"])[^'"]*\1 to match a text that is quoted by single OR double ticks but using the same tick at the beginning and end
- ? : 0 or 1 times
- * : 0 or multiple times
- + : 1 or multiple times
- {} : explicit counts {3}, ranges {2,4} or “at least/most” {2,},{,3}
  - str_view("aaaa","a+") reports the whole string as match and not only one letter!
  - (ab)+ matches "ababab"
- str_view() shows pattern matches within the input strings (e.g. for regex testing)
  - match=T : show hits only (also useful to list not-matched strings)
- str_subset() provides the subset of input elements (a vector of strings) that match the given regex, e.g. str_subset(c("a","b","c"), "a|b")
- str_detect() : logical T|F if a string contains a pattern
- str_subset() : all elements that contain a pattern
- str_count() : number of pattern occurrences per string
- str_extract() : only overall match of the search pattern
- str_match() : overall match + each group match of the search pattern
- str_replace() : replacement of matches
  - str_replace("A B", ".*(\\S+)$", "last word was '\\1'")
- str_split() : decompose into multiple strings
  - simplify=TRUE returns a matrix rather than a list
  - n=.. number of final pieces, i.e. matches(+1) used for decomposition
  - boundary() : general regex for matching of boundaries of words, sentences, …
- str_locate(): position of the match (start+end)
- .._all() variants : returns all matches not only the first
- All regular expression strings are processed by regex(), which can be further constrained if needed, e.g. using
  - ignore_case : case insensitive matching
  - multiline : treats \n like string ends (for ^ and $ matching)
  - dotall : will make . also match linebreaks \n and other stuff
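The pattern functions above can be compared side by side on a small input. This is a minimal sketch that assumes stringr is installed; the vector texts and the other object names are made up for illustration.

```r
library(stringr)

texts <- c("go left", "go right", "stay", "ababab 42")

has_direction <- str_detect(texts, "go (left|right)")      # TRUE TRUE FALSE FALSE
digits        <- str_extract(texts, "\\d+")                # NA NA NA "42"
n_ab          <- str_count(texts, "ab")                    # 0 0 0 3
with_groups   <- str_match("go left", "go (left|right)")   # matrix: overall match + group
swapped       <- str_replace("A B", "(\\S+) (\\S+)", "\\2 \\1")   # "B A"
pieces        <- str_split("a-b-c", "-", simplify = TRUE)  # matrix with 1 row, 3 columns

# regex() allows further constraints such as case insensitive matching.
a_words <- str_subset(c("Apple", "apple", "pear"), regex("^a", ignore_case = TRUE))
```

str_view(texts, "go (left|right)") shows the same matches highlighted within the input strings.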
Also important:
- intersect() provides the elements shared among two vectors
- unlist() merges all elements of a list into a single vector
- unique() provides the unique elements of a list or vector
- matrix is like a table-version of a vector (all elements same data type)
- vector = all elements are same data type
- list = most general, can contain anything, any size, …
- data.frame/tibble = list of vectors (of same lengths)
- [] = reduces the current object to the selected part(s) (same data container)
- [[]] = provides a single element (element-specific data container)
- $ = shortcut for [[]] with name
- iteration (for or while loops)
  - for( VARIABLE in DATA ) {...}
  - each element of DATA is one by one stored in VARIABLE before running {...}
  - for( d in list( x="haha", y=1:3)) { print(length(d)) }
- conditional execution (if/else statements)
  - if ( CONDITION ) {..T..} else {..F..}
  - else {..F..} is optional
  - CONDITION must evaluate to a single TRUE|FALSE: triggers execution of respective {..T|F..} block
    - if ( 1:4 == 2 ) {} not working since check returns four logic values!
  - if ( version$os == "mingw32" ) { print("MS Windows user?") }
- functions (function definition and usage)
  - myFunction <- function (ARGUMENTS) {...}
  - ARGUMENTS are optional (first should be the data to work on)
  - call via myFunction( .. ) using appropriate values for ARGUMENTS
  - result via return() in {...} (default: last “printed” value is returned)
  - multiple results via list() or vector c()
- for loop generalization
- function generalization
- seq_along() to generate the list of valid indices of a vector or list
- names() to access the vector of element names of an object
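Loops, conditions and functions from the list above can be combined in a short base R sketch. The function describe and all object names are hypothetical and only serve as illustration.

```r
# Hypothetical helper: 'describe' and its argument name are made up.
describe <- function(data) {
  if (length(data) > 1) {          # condition yields a single TRUE/FALSE
    kind <- "multiple values"
  } else {
    kind <- "single value"
  }
  # several results are returned together as a list
  return(list(kind = kind, size = length(data)))
}

inputs <- list(x = "haha", y = 1:3)

# for loop over the valid indices; names() provides the element names.
sizes <- integer(0)
for (i in seq_along(inputs)) {
  result <- describe(inputs[[i]])            # [[ ]] yields the element itself
  sizes[names(inputs)[i]] <- result$size     # $ = shortcut for [["size"]]
}
print(sizes)
```

Iterating over seq_along(inputs) instead of the elements themselves keeps the index available, which is needed here to look up the matching name.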

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Feel free to contribute at https://github.com/Dr-Eberle-Zentrum/Introduction-to-programming-with-R.